The standard method of computing the mutual information between 
two stochastic processes with finite energy replaces the processes with their 
Fourier coefficients. This procedure is mathematically justified here 
for random signals $\xi_t(\omega)$ square-integrable in the product space 
$t \times \omega$, where $t \in [0, T]$ and $\omega$ is an element of a 
probability space. A natural notion of the sigma field generated by 
$\xi_t(\omega)$ is presented and it is shown to coincide with the sigma field 
generated by the random Fourier coefficients of $\xi_t(\omega)$ in any complete 
orthonormal system in $L^2[0, T]$. This justifies the use of Fourier 
coefficients in mutual information computations. 

Capacity is calculated for finite and infinite-dimensional channels, where 
the output signal consists of a filter (general Hilbert-Schmidt operator) 
operating on the input signal with additive Gaussian noise. The 
finite-dimensional optimal signal is obtained. In the infinite-dimensional 
case capacity can be approached arbitrarily closely with finite-dimensional 
inputs. The question of the existence of an infinite-dimensional signal which 
achieves capacity is considered. There are channels for which no signal 
achieves capacity. Some results are obtained when the noise coordinates are 
independent in the eigensystem of the filter. 

I. INTRODUCTION 

In this paper, we attack a general form of the classical problem of 
determining the capacity of a linear channel with additive noise. 
Structurally we have 

$$ r_t(\omega) = \int_0^T G(t, \tau)\, s_\tau(\omega)\, d\tau + n_t(\omega) \tag{1} $$

where the random signals, noise [$n_t(\omega)$], input [$s_t(\omega)$], and 
output [$r_t(\omega)$] are all defined on $0 \le t \le T$. All signals as well 
as the kernel of the channel operator are assumed square integrable in the 
appropriate product spaces. The noise process, the channel operator, and an 
average power restriction on $s_t(\omega)$ are assumed to be given. In Section 
III we begin by defining the capacity of a channel. Our definition is motivated 
by, but is not a special case of, the generalization of Shannon's notion 
of capacity that has been indicated by Kolmogorov. The argument for 
the naturalness of our definition is that any of the above processes can 
be replaced by their random Fourier coefficients from any expansion 
using complete orthonormal functions in $L^2[0, T]$. We solve the above 
problem when $n_t(\omega)$ is Gaussian and independent of $s_t(\omega)$. In 
Section IV we show that for finite-dimensional inputs there always exists an 
$s_t(\omega)$ for which capacity is achieved and we find it. The 
infinite-dimensional case is solved in Section V as a limit of 
finite-dimensional cases. 

II. FUNDAMENTALS 

Fundamental to the notion of capacity is the notion of mutual information. 
We begin with Kolmogorov's definition of the mutual information of two event 
$\sigma$-fields contained in a universal $\sigma$-field. Let $\mathcal{A}$ and 
$\mathcal{B}$ denote two sub $\sigma$-fields of a $\sigma$-field $S_\Omega$ in 
a probability space $(\Omega, S_\Omega, P)$. Let $\alpha$ and $\beta$ denote 
arbitrary partitions of $\Omega$ into a finite number of $\mathcal{A}$ and 
$\mathcal{B}$ measurable sets $A$ and $B$. The mutual information 
$I(\mathcal{A}, \mathcal{B})$ of $\mathcal{A}$ and $\mathcal{B}$ is 

$$ I(\mathcal{A}, \mathcal{B}) = \sup_{\alpha, \beta} \sum_{A \in \alpha} \sum_{B \in \beta} P(A \cap B) \log \frac{P(A \cap B)}{P(A)\, P(B)}. \tag{2} $$

We define $0 \log 0 = 0$. This sum does not decrease as $\alpha$ and $\beta$ 
are refined. It can be shown that $I(\mathcal{A}, \mathcal{B}) \ge 0$ with 
equality if and only if $\mathcal{A}$ and $\mathcal{B}$ are independent. The 
nonnegativity and other important properties of $I$ are presented in Ref. 1. 
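
As a finite illustration of equation (2), the sum inside the supremum can be evaluated for one fixed pair of partitions. The following is only a sketch under stated assumptions: the function names `partition_information` and `merge_last_two`, the probability tables, and the use of the natural logarithm are all hypothetical choices made for illustration and are not taken from the paper. It shows that independent partitions give zero and that coarsening a partition does not increase the sum.

```python
import math

def partition_information(joint):
    """Evaluate sum P(A∩B) log[P(A∩B) / (P(A) P(B))] for one pair of partitions.

    joint[i][j] is a hypothetical probability P(A_i ∩ B_j).
    The natural logarithm is an assumption; cells with P(A∩B) = 0 are
    skipped, which implements the convention 0 log 0 = 0.
    """
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:
                total += p * math.log(p / (p_a[i] * p_b[j]))
    return total

def merge_last_two(joint):
    """Coarsen the first partition by merging its last two sets."""
    merged = [a + b for a, b in zip(joint[-2], joint[-1])]
    return joint[:-2] + [merged]

independent = [[0.25, 0.25], [0.25, 0.25]]
fine = [[0.30, 0.05], [0.05, 0.30], [0.10, 0.20]]
coarse = merge_last_two(fine)

print(partition_information(independent))  # zero for independent partitions
print(partition_information(fine), partition_information(coarse))
```

The supremum in (2) is then approached by refining the partitions; the sketch evaluates only single terms of that family.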

Let $\mathcal{X}$ be a measurable space with $\sigma$-field denoted by 
$\mathcal{D}$. A function $\zeta(\omega)$ from $(\Omega, S_\Omega, P)$ to 
$\mathcal{X}$ for which each $D \in \mathcal{D}$ has a preimage in $S_\Omega$ 
is called a measurable function. 

Let $T$ be an arbitrary index set and let $E^1$ denote the real line. Endow 
$\prod_{t \in T} E^1$ with the product topology and consider its measurable 
sets to be the smallest $\sigma$-field containing the topology. We are 
interested in measurable functions from $\Omega$ to $\prod_{t \in T} E^1$. For 
our purposes $T$ is either countable or a real compact interval. 

Suppose $\xi$ and $\eta$ are measurable functions from $\Omega$ to 
$\prod_{t \in T} E^1$. Then by the mutual information of $\xi$ and $\eta$, 
$I(\xi, \eta)$, we mean the mutual information between the smallest 
$\sigma$-fields with respect to which $\xi$ and $\eta$ are measurable. We 
denote these respective $\sigma$-fields by $\mathcal{A}_\xi$ and 
$\mathcal{A}_\eta$. 
Let $\zeta(\omega)$ denote any measurable function from $\Omega$ to 
$\prod_{t \in T} E^1$. We define the probability distribution $P_\zeta$ of 
$\zeta(\omega)$. The domain of $P_\zeta$ is the measurable sets in 
$\prod_{t \in T} E^1$. Let $Q$ be such a measurable set. Then 

$$ P_\zeta(Q) = P\{\omega : \zeta(\omega) \in Q\}. \tag{3} $$

If $\xi$ and $\eta$ are each measurable functions from $\Omega$ to 
$\prod_{t \in T_1} E^1$ and $\prod_{t \in T_2} E^1$ respectively, then 
$(\xi, \eta)$ is a measurable function from $\Omega$ to 
$\prod_{t \in T_1} E^1 \times \prod_{t \in T_2} E^1$ and its distribution 
function is denoted $P_{\xi\eta}$. It is called the joint distribution of 
$\xi$ and $\eta$. We can now give an alternate definition of mutual 
information between $\xi$ and $\eta$. Let $\gamma$ ($\delta$) denote arbitrary 
partitions of $\prod_{t \in T_1} E^1$ ($\prod_{t \in T_2} E^1$) into a finite 
number of measurable sets $C$ ($D$). The mutual information $I(\xi, \eta)$ is 

$$ I(\xi, \eta) = \sup_{\gamma, \delta} \sum_{C \in \gamma} \sum_{D \in \delta} P_{\xi\eta}(C \times D) \log \frac{P_{\xi\eta}(C \times D)}{P_\xi(C)\, P_\eta(D)}. \tag{4} $$

Recall that the inverse image under a measurable function of a $\sigma$-field 
is a $\sigma$-field. So it becomes apparent that the two definitions for 
$I(\xi, \eta)$ are equivalent. 

We review without proof some fundamental propositions that will be 
of use to us later. The following is a result of work by I. M. Gelfand, 
A. M. Yaglom, and A. Perez. 

Theorem: If $P_{\xi\eta}$ is not absolutely continuous with respect to the 
product measure $P_\xi \times P_\eta$, then $I(\xi, \eta) = \infty$. If 
$P_{\xi\eta}$ is absolutely continuous with respect to $P_\xi \times P_\eta$, 
then letting $dP_{\xi\eta}/d(P_\xi \times P_\eta)$ denote the Radon-Nikodym 
derivative of $P_{\xi\eta}$ with respect to $P_\xi \times P_\eta$ we have 

$$ I(\xi, \eta) = \int \log \frac{dP_{\xi\eta}}{d(P_\xi \times P_\eta)}\, dP_{\xi\eta}. \tag{5} $$

Proof: See Ref. 2. 

Theorem: Let $A$ be a linear transformation in a $k$-dimensional vector 
space and let $\xi$ be a $k$-dimensional random vector. Then 

$$ I(\xi, \eta) \ge I(A\xi, \eta) \tag{6} $$

holds for any random vector $\eta$, with equality if the transformation $A$ is 
nonsingular. 

Proof: See Ref. 3. 

Theorem: If $I(\xi, \xi) < \infty$, then $P_\xi$ is purely atomic. 

Proof: See Ref. 4. 

Theorem: If $\xi = (\xi_1, \xi_2, \ldots)$, then 

$$ I(\xi, \eta) = \lim_{n \to \infty} I[(\xi_1, \ldots, \xi_n), \eta]. \tag{7} $$

Proof: See Ref. 4. 

III. MUTUAL INFORMATION BETWEEN TWO PROCESSES IN $L^2\{(\Omega, S_\Omega, P) \times ([0, T], L, m)\}$ 

Let $\xi_t(\omega)$ be square integrable on $t \times \omega$. We term 
$\xi_t(\omega)$ a stochastic process. Notice that it differs from the standard 
definition of a stochastic process in two ways. First, it is an equivalence 
class of functions on $t \times \omega$ that are equal almost everywhere. 
Second, not all functions in the equivalence class are stochastic processes in 
the sense of Ref. 5; that is, we have a random variable not for each $t$ but 
only for almost all $t$. We assume $E\{\xi_t(\omega)\} = 0$. By Schwarz's 
inequality and Fubini's theorem it follows that 
$E\{\xi_{t_1}(\omega)\, \xi_{t_2}(\omega)\} \in L^2(t \times t)$. If 
$\eta_t(\omega)$ and $\zeta_t(\omega)$ are processes of the same type as 
$\xi_t(\omega)$, $I[\eta_t(\omega), \zeta_t(\omega)]$ is not well defined since 
$\mathcal{A}_\eta$ and $\mathcal{A}_\zeta$ are not well defined. Because of the 
central role of these processes in modeling random signals with a finite 
average power we make $\mathcal{A}_\eta$ and $\mathcal{A}_\zeta$, and hence 
$I[\eta_t(\omega), \zeta_t(\omega)]$, meaningful here. We need to appeal to 
the following: 

Theorem (F. Riesz): Let $f_n$ converge in measure to $f$. Then there exists a 
subsequence $f_{n_k}$ converging to $f$ almost everywhere. 

Proof: See Ref. 6. 

Suppose $f_n$ converges in mean square to $f$. Since convergence in mean 
square implies convergence in measure, the limit of the subsequence 
guaranteed by Riesz's theorem is $f$ in the sense that the limit and $f$ agree 
almost everywhere. This last comment is important since Kolmogorov 
has given examples of functions $g$ which possess an orthogonal expansion 
$g_n$ converging in mean square to $g$, yet pointwise almost everywhere 
convergence does not occur. 

Unless stated otherwise all $\sigma$-fields mentioned in the remainder of 
this section are assumed to be completed. The following new definition 
is the key to making $\xi_t(\omega)$ meaningful in the information theory sense. 

Definition: By the $\sigma$-field $\mathcal{A}_\xi$ generated by 
$\xi_t(\omega)$ we mean the smallest $\sigma$-field $\mathcal{A}$ such that 
$\xi_t(\omega)$ is $(\Omega, \mathcal{A}) \times ([0, T], L)$ measurable, where 
$L$ is the sigma field of Lebesgue measurable subsets of $[0, T]$. (This 
statement is definitive since $\xi_t(\omega)$ is 
$(\Omega, S_\Omega) \times ([0, T], L)$ measurable and the intersection of 
$\sigma$-fields is a $\sigma$-field.) 
Proposition I: Suppose 

$$ \sum_i a_i(\omega)\, \phi_i(t) = \xi_t(\omega) \tag{8} $$

in the mean square sense in the product space (where 
$a_i(\omega) = \int \xi_t(\omega)\, \phi_i(t)\, dt$ and the $\phi_i(t)$ are 
orthonormal on $[0, T]$). If $a(\omega) = [a_1(\omega), a_2(\omega), \ldots]$ 
then $\mathcal{A}_\xi = \mathcal{A}_a$. 

Proof: Since the expansion converges to $\xi_t(\omega)$ in mean square in the 
product space, it converges in measure. By F. Riesz's theorem we can 
find $n_1 < n_2 < \cdots$ so that 

$$ \lim_{k \to \infty} [a_1(\omega)\, \phi_1(t) + \cdots + a_{n_k}(\omega)\, \phi_{n_k}(t)] = \xi_t(\omega). \tag{9} $$

The sum and product of measurable functions is measurable so that 
each partial sum is $(\Omega, \mathcal{A}_a) \times ([0, T], L)$ measurable. 
The limit of measurable functions is measurable so $\xi_t(\omega)$ is 
$(\Omega, \mathcal{A}_a) \times ([0, T], L)$ measurable. Thus 
$\mathcal{A}_\xi \subset \mathcal{A}_a$. 

Next we project $\xi_t(\omega)$ on $\phi_i(t)$ to get 

$$ a_i(\omega) = \int \xi_t(\omega)\, \phi_i(t)\, dt. \tag{10} $$

By Fubini's theorem $a_i(\omega)$ is measurable with respect to every 
$\sigma$-field $\mathcal{B}$ for which $\xi_t(\omega)$ is 
$(\Omega, \mathcal{B}) \times ([0, T], L)$ measurable. But this is true for 
each $i$, so $\mathcal{A}_a \subset \mathcal{A}_\xi$. 

Proposition I is of paramount importance. In the sequel it enables 
us to replace $\xi_t(\omega)$ by $a(\omega)$ when computing mutual information. 

It would seem appropriate to express $\mathcal{A}_\xi$ without reference to an 
expansion. The following proposition accomplishes this. However, our 
proof does resort to an expansion of $\xi_t(\omega)$. Because the proof is 
similar to the proof of Proposition I, we omit it. 

Proposition II: Let $\{\xi_t^\alpha(\omega)\}$ denote the class of functions 
in $\xi_t(\omega)$. Then $\mathcal{A}_\xi$ is the smallest $\sigma$-field 
containing $\bigcap_\alpha \mathcal{A}_{\xi^\alpha}$. [Here we have the only 
appearance of possibly noncomplete $\sigma$-fields (the 
$\mathcal{A}_{\xi^\alpha}$).] 

We can now define capacity of our noisy linear channel. Let $S$ denote 
a finite average power restriction on $s_t(\omega)$. Then the capacity of the 
channel is defined as the supremum of $I[s_t(\omega), r_t(\omega)]$ where the 
supremum is over all $s_t(\omega)$ satisfying 

$$ E\left\{ \frac{1}{T} \int_0^T s_t^2(\omega)\, dt \right\} \le S. \tag{11} $$

We say $\xi_t(\omega)$ is Gaussian if the linear functionals 

$$ \int_0^T \xi_t(\omega)\, \phi(t)\, dt \qquad \{\phi(t) \in L^2[0, T]\} $$

are all Gaussian random variables. 

IV. THE FINITE-DIMENSIONAL CASE 

For a random variable $\eta$ possessing density $p_\eta$ the quantity 

$$ h(\eta) = -\int p_\eta \log p_\eta \tag{12} $$

arises often in mutual information studies. It is called the differential 
entropy of $\eta$. 

The following theorem is proved in Ref. 7. 

Theorem: Let $p_u$ be the density of a $k$-dimensional random variable $u$. 
To maximize 

$$ h(u) = -\int p_u \log p_u $$

subject to the conditions that the mean and dispersion matrix have given 
values $\mu$ and $\Gamma$, choose the normal density 

$$ Q(u) = (2\pi)^{-k/2}\, |\Gamma|^{-1/2} \exp\left[ -\tfrac{1}{2} (u - \mu)' \Gamma^{-1} (u - \mu) \right], $$

which satisfies the conditions. 

We prove a corollary necessary for the sequel. 

Corollary: Let $p_u$ be the density of a $k$-dimensional variable $u$. We want 
to choose $p_u$ to maximize 

$$ h(u) = -\int p_u \log p_u $$

subject to the conditions that the mean is $\mu$ and the dispersion matrix 
satisfies the constraint that its trace is less than or equal to $ST$. The 
solution is to choose $p_u$ to be Gaussian with mean $\mu$ and covariance 
$(ST/k) I$, where $I$ is the identity matrix. 

Proof: From the preceding theorem we only need to consider Gaussian 
densities. For a Gaussian density we can write the formula 

$$ h(u) = \frac{k}{2} \log 2\pi e + \frac{1}{2} \log |\Gamma|. \tag{13} $$

Maximizing $h(u)$ is equivalent to maximizing $|\Gamma|$. Now by the geometric 



LINEAR CHANNEL CAPACITY 






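mean-arithmetic mean inequality 

$$ |\Gamma| \le \left( \frac{\operatorname{trace} \Gamma}{k} \right)^k \tag{14} $$

with equality if and only if $\Gamma$ is diagonal with 
$\gamma_{11} = \gamma_{22} = \cdots = \gamma_{kk}$. 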
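
Equations (13) and (14) can be checked numerically in the smallest nontrivial case, $k = 2$. The sketch below is illustrative only: the function name `gaussian_entropy_2d`, the trace budget of 4, and the three candidate covariances are hypothetical values chosen for the comparison, and entropies are in nats.

```python
import math

def gaussian_entropy_2d(g11, g12, g22):
    """Differential entropy (nats) of a 2-D Gaussian with covariance
    [[g11, g12], [g12, g22]], using h = (k/2) log(2*pi*e) + (1/2) log|Gamma|
    with k = 2."""
    det = g11 * g22 - g12 * g12
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    return math.log(2.0 * math.pi * math.e) + 0.5 * math.log(det)

# Three hypothetical covariances, all with trace 4 (playing the role of ST).
candidates = {
    "scalar": (2.0, 0.0, 2.0),      # (ST/k) I
    "unequal": (3.0, 0.0, 1.0),     # diagonal, unequal entries
    "correlated": (2.0, 1.0, 2.0),  # equal diagonal, nonzero off-diagonal
}
entropies = {name: gaussian_entropy_2d(*c) for name, c in candidates.items()}
for name, h in entropies.items():
    print(f"{name}: {h:.4f}")
```

Among these candidates the scalar covariance has the largest determinant and hence the largest entropy, in line with the corollary.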

Now we are ready to consider the finite-dimensional version of the 
problem of finding the optimal power-restricted signal $s_t(\omega)$ which 
maximizes the mutual information between it and the output $r_t(\omega)$. 
By what we have shown in the previous section we can replace these 
processes by their Fourier coefficients when computing mutual information. 
More specifically let $n_t(\omega)$ be a finite-dimensional Gaussian 
process of dimension $k$. Let $G$ denote a nonsingular operator on $E^k$ and 
let $s_t(\omega)$ be a $k$-dimensional process that is independent of 
$n_t(\omega)$. Suppose that the distribution of $n_t(\omega)$ is absolutely 
continuous with respect to Lebesgue measure in $E^k$. We want to find 
$s_t(\omega)$ such that its distribution is absolutely continuous with respect 
to Lebesgue measure in $E^k$ and $I[s_t(\omega), G s_t(\omega) + n_t(\omega)]$ 
is maximized subject to 

$$ E \int_0^T s_t^2(\omega)\, dt \le ST. $$

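Now by the theorem concerning linear transformations of random 
vectors stated earlier, 
$I[s_t(\omega), G s_t(\omega) + n_t(\omega)] = I[s_t(\omega), s_t(\omega) + G^{-1} n_t(\omega)]$. 
Define $\eta_t(\omega)$ as $\eta_t(\omega) = G^{-1} n_t(\omega)$ and let 

$$ \eta^k = (\eta_1, \ldots, \eta_k)' \quad \text{and} \quad s^k = (s_1, \ldots, s_k)' $$

be coordinates of $\eta_t(\omega)$ and $s_t(\omega)$. 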
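Then 

$$ I[s_t(\omega), s_t(\omega) + \eta_t(\omega)] = \int p_{s^k + \eta^k,\, s^k} \log \frac{p_{s^k + \eta^k,\, s^k}}{p_{s^k + \eta^k}\, p_{s^k}}. $$

Introducing the transformation 
into the above integral and using the fact that $s^k$ and $\eta^k$ are independent 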





THE BELL SYSTEM TECHNICAL JOURNAL, JANUARY 1970 



we have 

$$ I(s^k, s^k + \eta^k) = \int p_{s^k + \eta^k,\, s^k} \log p_{\eta^k} + h(s^k + \eta^k) = -h(\eta^k) + h(s^k + \eta^k). \tag{15} $$

Since $h[\eta_t(\omega)]$ is not a function of $s_t(\omega)$, we have reduced 
the problem to that of maximizing $h[s_t(\omega) + G^{-1} n_t(\omega)]$ 
subject to 

$$ E \int_0^T s_t^2(\omega)\, dt \le ST. $$



Now $\eta_t(\omega)$ is Gaussian and we know from the corollary stated earlier 
that $h[s_t(\omega) + G^{-1} n_t(\omega)]$ subject to the above constraint is 
maximized by a Gaussian process. Thus, without loss of generality, 
$s_t(\omega)$ can be assumed to be Gaussian. We seek $\Gamma_s$, the covariance 
of $s_t(\omega)$, so that $|\Gamma_s + G^{-1} \Gamma_n (G^{-1})'|$ is maximized, 
since this maximizes $h[s_t(\omega) + G^{-1} n_t(\omega)]$. Let us assume, 
without loss of generality, that $G^{-1} \Gamma_n (G^{-1})' = \Gamma_\eta$ 
is diagonal. Thus the problem is to maximize 

$$ \left| \Gamma_s + \begin{pmatrix} \eta_1 & & 0 \\ & \ddots & \\ 0 & & \eta_k \end{pmatrix} \right| $$

subject to $\Gamma_s$ a covariance matrix with 
$\operatorname{trace}(\Gamma_s) \le ST$. Since we are maximizing a continuous 
function over a compact set, we know that the maximum exists. 

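We use induction to show that the optimal $\Gamma_s$ is diagonal. For $m = 1$ 
the statement is a trivial one. For $m > 1$ it shall be convenient to 
partition $\Gamma_s$ so that 

$$ \Gamma_s = \begin{pmatrix} \gamma_{11} & Y' \\ Y & \tilde{\Gamma}_s \end{pmatrix}, $$

where $\tilde{\Gamma}_s$ denotes the lower-right $(m-1) \times (m-1)$ block. Let 

$$ \Gamma = \tilde{\Gamma}_s + \begin{pmatrix} \eta_2 & & 0 \\ & \ddots & \\ 0 & & \eta_m \end{pmatrix}. $$

Now using some standard results on determinants (see Ref. 8, p. 46), 
it follows that 

$$ |\Gamma_s + \Gamma_\eta| = \begin{vmatrix} \gamma_{11} + \eta_1 & Y' \\ Y & \Gamma \end{vmatrix} = (\gamma_{11} + \eta_1) \det \Gamma - (\det \Gamma)\, Y' \Gamma^{-1} Y. \tag{16} $$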

Note that both $\Gamma$ and $\Gamma^{-1}$ are positive definite. It is optimal 
to choose $Y = 0$ to maximize the second term. The first term is also optimal 
if $\Gamma$ is diagonal. This follows since any nondiagonal $\Gamma$ with trace 
$\sum_{i=2}^{k} \gamma_{ii} = ST - \gamma_{11}$ has determinant less than or 
equal to some diagonal matrix with trace equal to $ST - \gamma_{11}$ by 
induction. We now optimally select the diagonal elements of $\Gamma_s$. If an 
$\epsilon > 0$ of $ST$ is to be put on a diagonal element of $\Gamma_s$, it is 
optimal to add it to $\min\{(\gamma_{ii} + \eta_i),\ i = 1, \ldots, k\}$ 
so that it will have the largest possible multiplier in the determinant 
$|\Gamma_s + \Gamma_\eta|$. 
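
The allocation rule just described (always add power to the smallest $\gamma_{ii} + \eta_i$) ends with the lowest coordinates raised to a common level. The following sketch computes that end state directly for a diagonal $\Gamma_\eta$. It is a minimal illustration under stated assumptions, not the paper's procedure: the names `fill_the_well` and `information`, the noise variances, and the power budget are hypothetical, the noise list is assumed nonempty with positive entries, and logarithms are natural.

```python
import math

def fill_the_well(noise, power):
    """Allocate `power` over coordinates with noise variances `noise`
    so that prod(gamma_i + noise_i) is maximized with gamma_i >= 0.

    Returns (gammas, level), where level is the common value of
    gamma_i + noise_i on the coordinates that receive power.
    """
    order = sorted(range(len(noise)), key=lambda i: noise[i])
    running = 0.0
    level = None
    for m, idx in enumerate(order, start=1):
        running += noise[idx]
        candidate = (power + running) / m  # common level on the m lowest
        is_last = (m == len(order))
        # Stop once the level does not reach the next noise variance.
        if is_last or candidate <= noise[order[m]]:
            level = candidate
            break
    gammas = [max(level - n, 0.0) for n in noise]
    return gammas, level

def information(gammas, noise):
    """(1/2) sum log(1 + gamma_i / eta_i), in nats, for a Gaussian input."""
    return 0.5 * sum(math.log(1.0 + g / n) for g, n in zip(gammas, noise))

noise = [1.0, 2.0, 4.0]  # hypothetical eta_1, ..., eta_k
power = 2.0              # hypothetical ST
gammas, level = fill_the_well(noise, power)
uniform = [power / len(noise)] * len(noise)

print(gammas, level)     # [1.5, 0.5, 0.0] 2.5
print(information(gammas, noise), information(uniform, noise))
```

In this example the third coordinate receives no power because its noise variance already exceeds the common level, and the resulting information exceeds that of a uniform split of the same power.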

V. CAPACITY FOR THE INFINITE-DIMENSIONAL CASE 

We turn to calculating the capacity in the situation where there can 
be an infinite number of Fourier coefficients of $s_t(\omega)$ and 
$\eta_t(\omega)$ and where the channel $G$ is an infinite-dimensional 
Hilbert-Schmidt operator. 

Define 

$$ G = \int_0^T G(t, \tau)\, (\cdot)\, d\tau $$

where $G(t, \tau) \in L^2(t \times \tau)$. Let $\{\phi_i\}$ be a complete set 
of orthonormal eigenfunctions for $G^* G$ and let $\{\lambda_i\}$ be the 
associated eigenvalues. Define $G \phi_i = \psi_i$; then 
$(\psi_i, \psi_j) = (G\phi_i, G\phi_j) = (\phi_i, G^* G \phi_j) = \lambda_i \delta_{ij}$ 
and so $\{\psi_i / \lambda_i^{1/2}\}$ is an orthonormal set. We use $r$ and 
$\eta$ to denote the infinite vector of Fourier coefficients of $r_t(\omega)$ 
and $\eta_t(\omega)$ in the system $\{\psi_i / \lambda_i^{1/2}\}$ while $s$ 
denotes the infinite vector of Fourier coefficients of $s_t(\omega)$ in the 
system $\{\phi_i\}$. Let $r^k$ denote the first $k$ coefficients of $r$ and 
define $s^k$ and $\eta^k$ similarly. Let $D$ be the doubly infinite diagonal 
matrix with $\lambda_i^{1/2}$ as the $i$th diagonal element and define $D_k$ to 
be the $k \times k$ submatrix of $D$ with indices less than or equal to $k$. 
Then $r = Ds + \eta$ and $r^k = D_k s^k + \eta^k$. 

We first show that if an optimal input signal exists, then there is an 
optimal Gaussian input signal. We shall need the following lemma. 

Lemma: For any signal $s$, $\lim_{i \to \infty} I(r^i, s^i) = I(r, s)$. 

Proof: We know that $\lim_i \lim_k I(r^i, s^k) = I(r, s)$. As stated earlier 
this is proved in Ref. 4. Now 

$$ I(r^i, s^j) = h(r^i) + \int p_{r^i, s^j} \log \frac{p_{r^i, s^j}}{p_{s^j}} $$

where $r^i = D_i s^i + \eta^i$. If $j \ge i$, 

$$ \int p_{r^i, s^j} \log \frac{p_{r^i, s^j}}{p_{s^j}} = \int p_{\eta^i}\, p_{s^j} \log p_{\eta^i} = \int p_{\eta^i} \log p_{\eta^i}. $$

Thus $I(r^i, s^j) = I(r^i, s^i)$ for $j > i$; then 
$\lim_{j \to \infty} I(r^i, s^j) = I(r^i, s^i) = I(r^i, s)$. Finally 
$\lim_{i \to \infty} I(r^i, s) = \lim_{i \to \infty} I(r^i, s^i) = I(r, s)$. 

Alternately this lemma can be proved by extending some results of 
Ref. 3 to the infinite-dimensional case. 

We now show that if an optimal signal exists for the infinite-dimen- 
sional case, then the optimal signal can be assumed, without loss of 
generality, to be Gaussian. 

Proposition III: If $s_1$ is a non-Gaussian optimal signal then $s_2$, the 
Gaussian signal with the same covariance matrix as $s_1$, is optimal. 

Proof: Clearly $I(r_1^k, s_1^k) \le I(r_2^k, s_2^k)$ for all $k$ since the 
Gaussian process is optimal for a fixed covariance matrix in the 
finite-dimensional case. Thus 
$I(r_1, s_1) = \lim_k I(r_1^k, s_1^k) \le I(r_2, s_2) = \lim_k I(r_2^k, s_2^k)$. 

Proposition IV: The capacity of the infinite-dimensional channel is the 
limit of the capacities of the $k$-dimensional truncated approximations of the 
infinite-dimensional channel. 

Proof: Let $C_k$ denote the capacity of the $k$-dimensional channel and let 
$C = \lim C_k$. We claim the capacity of the infinite-dimensional channel 
is $C$. It is evidently at least $C$. Suppose a signal $s_t(\omega)$ exists 
satisfying the constraints with mutual information $I(r, s)$ greater than $C$. 
Then $I(r^k, s^k) \le C_k$ since $s^k$ satisfies the power constraint. Thus, 
by the lemma, $I(r, s) = \lim_k I(r^k, s^k) \le C$, a contradiction. 

Corollary 1 : There exist finite-dimensional signals whose resulting mutual 
information is arbitrarily close to the capacity. 

Corollary 2: If $C_k$ is constant for all $k$ larger than some integer $l$, 
then the $(l + 1)$-dimensional optimal signal is optimal for the 
infinite-dimensional case. 

5.1 Limiting Covariance Matrices and Optimal Signals When $\{\eta_i\}$ Is 
Independent 

It is not always true that some input signal achieves capacity in the 
infinite-dimensional case. We first prove this. Then we study the special 
case when $\{\eta_i\}$ is independent in the $\{\psi_i / \lambda_i^{1/2}\}$ 
system. This case may be of marginal interest as a model of a realistic 
system. However, it is mathematically tractable and hence serves as a good 
testing ground for intuition into more general behavior. 

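We now show that no optimal input signal exists for the case 
$\lambda_i = 1/i^2$, $E\eta_i^2 / \lambda_i = 1$. It is clear that 
$C_k = \frac{1}{2} \log (1 + ST/k)^k$. Then the capacity is 
$C = \frac{1}{2} \lim_{k \to \infty} \log (1 + ST/k)^k = ST/2$. If there exists 
an optimal signal $s$, $I(r^i, s^i) \to ST/2$. But 
$I(r^i, s^i) = \frac{1}{2} \log |\Gamma_{s^i} + I_i|$, where $I_i$ is an 
$i \times i$ identity matrix. Then 
$I(r^i, s^i) \le \frac{1}{2} \sum_{j=1}^{i} \log (1 + E s_j^2)$ and 
$\lim I(r^i, s^i) \le \frac{1}{2} \sum_{j=1}^{\infty} \log (1 + E s_j^2)$. 
Recall that $\sum_{j=1}^{\infty} E s_j^2 = ST$ by assumption. We show that 
$\sum_{j=1}^{\infty} \log (1 + E s_j^2) < ST$. Since 
$E s_j^2 \ge \log (1 + E s_j^2)$ with equality if and only if $E s_j^2 = 0$, 

$$ \lim I(r^i, s^i) \le \frac{1}{2} \sum_{j=1}^{\infty} \log (1 + E s_j^2) < \frac{1}{2} \sum_{j=1}^{\infty} E s_j^2 = ST/2. $$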
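
The behavior of $C_k = \frac{1}{2} \log (1 + ST/k)^k$ in this counterexample can be tabulated numerically. The sketch below is illustrative only: the function name `c_k` and the value $ST = 2$ are hypothetical, and the logarithm is taken to be natural so that the limit is $ST/2$. It shows the sequence increasing toward $ST/2$ while staying strictly below it for every finite $k$.

```python
import math

def c_k(k, st):
    """C_k = (1/2) log (1 + ST/k)^k = (k/2) log(1 + ST/k), in nats."""
    return 0.5 * k * math.log1p(st / k)

st = 2.0  # hypothetical value of ST
dims = (1, 10, 100, 1000, 10**6)
values = [c_k(k, st) for k in dims]
for k, v in zip(dims, values):
    # Second column is C_k; third is the remaining gap ST/2 - C_k.
    print(k, v, st / 2 - v)
```

The gap in the last column shrinks toward zero but never vanishes, which is the numerical counterpart of the statement that capacity is approached but not achieved.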

Although an optimal signal does not always exist, we can say when 
it does exist in the special case when the $\{\eta_i\}$ are independent in the 
system $\{\psi_i / \lambda_i^{1/2}\}$. It will turn out that 
$\{\Gamma_{s^k}\}$, the sequence of finite-dimensional optimal covariance 
matrices for $s$, converges in some cases to an optimal solution and in other 
cases the limit is not optimal. The diagonal matrix with 
$a_i = (1/\lambda_i) E\eta_i^2$ on the $i$th diagonal element completely 
determines whether or not an optimal limit is reached. 

We define the order of minima of a sequence $\{\xi_i\}_{i=1}^{\infty}$ as 
follows. The order is 0.5 if no smallest element in $\{\xi_i\}_{i=1}^{\infty}$ 
exists. If $M_1$ is defined to be the set of smallest elements in 
$\{\xi_i\}_{i=1}^{\infty}$ and $\mathrm{Card}(M_1) = +\infty$, the order 
of the sequence is 1. If $\mathrm{Card}(M_1) < +\infty$ but the set 
$\{\bigcup_i \xi_i - M_1\}$ has no least element, the order is 1.5. If the set 
$\{\bigcup_i \xi_i - M_1\}$ has $M_2$ smallest elements and 
$\mathrm{Card}(M_2) = +\infty$, the order is 2. If 
$\mathrm{Card}(M_2) < +\infty$ but the set 

$$ \left\{ \bigcup_i \xi_i - \bigcup_{j=1}^{2} M_j \right\} $$

has no least element then the order is 2.5, and so on. If the sequence 
is not assigned a finite order of minima, the order is infinite. 

If the order of minima of $\{a_i\}$ is 0.5, $\{\Gamma_{s^k}\} \to [0]$. To see 
this we need only consider diagonal elements of $\Gamma_{s^k}$. Suppose for 
some $j$ and for some $\epsilon > 0$, $E s_j^2 \ge \epsilon$ in an infinite 
number of $\Gamma_{s^k}$. Since no smallest element in $\{a_i\}$ exists, there 
are an infinite number of $a_i$, say $\{a_{i'}\}$, smaller than $a_j$. Then in 
the optimal covariance matrices where $E s_j^2 \ge \epsilon$ and the $i'$ 
appear, $E s_{i'}^2 \ge \epsilon$. But this is not possible with the constraint 
$\sum E s_i^2 \le ST$. Thus for each $j$ and $\epsilon > 0$, 
$E s_j^2 < \epsilon$ in all but a finite number of $\Gamma_{s^k}$. 

If the order of the minima of $\{a_i\}$ is 1, $\Gamma_{s^k} \to [0]$. This 
follows since it is optimal to put the power on the minima. After some $k$, 
only the minima will have positive $E s_i^2$. Since there are an infinite 
number of them and the $ST$ is optimally distributed equally on them, 
$\Gamma_{s^k} \to [0]$. 

If the order of the minima of $\{a_i\}$ is 1.5, there are two cases. Let 
$h = \inf \{\bigcup_i \xi_i - M_1\}$ and $g = \inf \{\bigcup_i \xi_i\}$. If 
$(h - g)\, \mathrm{Card}(M_1) \ge ST$, there is an optimal solution as a limit 
consisting of $E s_i^2 = ST / \mathrm{Card}(M_1)$ for those $i$ corresponding 
to $a_i \in M_1$. Otherwise the convergence is to a matrix where 
$E s_i^2 = h - g$ if $a_i \in M_1$ and zero elsewhere, which is clearly 
not optimal. The analysis for other finite order systems is analogous 
to the above. Either (i) $ST$ is distributed over a finite number of 
components, in which case the convergence is to an optimal solution, or 
(ii) $ST$ has to be distributed over an infinite number of components, in 
which case the convergence is not to an optimal solution. 

If the order is infinite and we run out of the quantity $ST$ on a finite 
number of components, the resulting finite-dimensional solution is 
optimal. Suppose the order is infinite and we do not run out of $ST$ on a 
finite number of dimensions. Let $\theta$ be the smallest accumulation point of 
$\{a_i\}$. If not all of $ST$ is used in making 

$$ E s_i^2 + \frac{E\eta_i^2}{\lambda_i} = \theta, $$

the limiting covariance is not optimal and no optimal covariance which 
achieves capacity may be constructed. This follows since a finite amount 
of $ST$ must be distributed equally to an infinite number of components. 
If all of the $ST$ is exactly used to make 

$$ E s_i^2 + \frac{E\eta_i^2}{\lambda_i} = \theta, $$

the limiting covariance is optimal. Before proving this we give an 
example of such a case. 



Let 

$$ \lambda_i = \frac{i + 1}{(i + 1)^3 - 1}, \qquad E\eta_i^2 = \frac{1}{(i + 1)^2}, $$

and assume the $\eta_i$ are independent. Then $a_i = 1 - 1/(i + 1)^3$. To 
bring all components 

$$ E s_i^2 + \frac{E\eta_i^2}{\lambda_i} $$

to 1 we need 

$$ ST = \sum_{i=1}^{\infty} \frac{1}{(i + 1)^3}, $$

and we are then in the case considered above. 
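
The power required in this example can be checked numerically. The sketch below assumes the reading $a_i = 1 - 1/(i + 1)^3$ of the example and a target level of 1; the function names `a` and `power_to_level` and the truncation points are hypothetical. The partial sums approach $\zeta(3) - 1 \approx 0.2020569$, where $\zeta(3)$ is Apery's constant.

```python
def a(i):
    """a_i = 1 - 1/(i + 1)^3 for i = 1, 2, ... (as read from the example)."""
    return 1.0 - 1.0 / (i + 1) ** 3

def power_to_level(level, n_terms):
    """Power needed to raise a_1, ..., a_n to the common value `level`."""
    return sum(level - a(i) for i in range(1, n_terms + 1))

partial = [power_to_level(1.0, n) for n in (10, 1000, 100000)]
for n, p in zip((10, 1000, 100000), partial):
    print(n, p)
```

Because the total is finite, a finite $ST$ can be spread over infinitely many coordinates without the per-coordinate power being forced to zero uniformly, which is what distinguishes this case from the earlier counterexample.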
We now show that the limiting covariance matrix for the case when 
there is just enough $ST$ to bring 

$$ E s_i^2 + \frac{E\eta_i^2}{\lambda_i} = \theta $$

is optimal. Let $\Gamma_{s_1}$ be the limiting matrix with the corresponding 
Gaussian process $s_1$. Let $r_1 = D s_1 + \eta$. We show that 
$I(r_1^k, s_1^k) \to C$ as $k \to \infty$. Now suppose $s^k$ is optimal for 
$k$ dimensions. Then 

$$ I(r^k, s^k) - I(r_1^k, s_1^k) = h(r^k) - h(r_1^k) = \frac{1}{2} \log \left( \left[ \theta + \frac{\delta(k)}{k} \right]^k \prod_{i=1}^{k} \lambda_i \right) - \frac{1}{2} \log \left( \theta^k \prod_{i=1}^{k} \lambda_i \right). \tag{17} $$

Here $\delta(k)$ equals that part of $ST$ not used in the matrix 
$\Gamma_{s_1}$ in the first $k$ dimensions. Clearly we are assuming that the 
smallest elements of $a_i$ appear first. Notice that $\delta(k) \to 0$ as 
$k \to \infty$. Then 

$$ I(r^k, s^k) - I(r_1^k, s_1^k) = \frac{1}{2} \log \left[ 1 + \frac{\delta(k)}{k \theta} \right]^k < \epsilon \tag{18} $$

for $k$ sufficiently large. Then 

$$ \lim I(r^k, s^k) = \lim I(r_1^k, s_1^k) = I(r_1, s_1) = C. \tag{19} $$
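
The vanishing of the difference in (18) can be illustrated with the earlier example. The sketch below is built on assumptions that are not stated in this form in the paper: it takes $a_i = 1 - 1/(i + 1)^3$, level $\theta = 1$, and $ST$ exactly equal to the sum of $1/(i + 1)^3$, so that $\delta(k)$ is the tail of that sum beyond coordinate $k$. The names `unused_power` and `gap` and the truncation at 200000 terms are hypothetical.

```python
import math

def unused_power(k, n_terms=200000):
    """delta(k): power the limiting allocation places beyond coordinate k,
    i.e. the tail sum of 1/(i+1)^3 for i > k, truncated at n_terms."""
    return sum(1.0 / (i + 1) ** 3 for i in range(k + 1, n_terms + 1))

def gap(k, theta=1.0):
    """(1/2) log [1 + delta(k)/(k*theta)]^k, the difference in (18), in nats."""
    return 0.5 * k * math.log1p(unused_power(k) / (k * theta))

dims = (1, 10, 100, 1000)
gaps = [gap(k) for k in dims]
for k, g in zip(dims, gaps):
    print(k, g)
```

Since $\log(1 + x) \le x$, the gap is bounded by $\delta(k) / (2\theta)$, so it tends to zero with $\delta(k)$.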

VI. SUMMARY 

Let us review what we have done. Since we chose to deal with signals 
$\xi_t(\omega)$ in $L^2\{(\Omega, S_\Omega, P) \times ([0, T], L, m)\}$, we 
define the mutual information between two such signals using Proposition 
II and equation (2) in such a way that it agrees with the mutual information 
of their Fourier coefficients defined in equation (10). For the 
channel defined in equation (1) with input signals constrained by 
equation (11), we calculate the capacity of the channel. First in Section 
IV the capacity problem is considered when only a finite number of 
Fourier coefficients are nonzero. We use the corollary to the theorem in 
Section IV and equation (15) to show that only Gaussian signals have 
to be considered. Then equation (16) is used to calculate the 
finite-dimensional optimal signal by "filling the well." In Section V the case 
of an infinite number of nonzero Fourier coefficients is considered. 
We show in Proposition III that optimal signals, if they exist, can be 
chosen Gaussian. In Proposition IV the capacity of the infinite-dimensional 
channel is calculated as the limit of finite-dimensional capacities. 
Finally in Section 5.1 we deal with the existence of an optimal signal. 
In general no optimal signal exists. A special case is examined when the 
noise components are independent in a fixed coordinate system. 

APPENDIX 

Symbols Used 

The following is a list of symbols used throughout the text. 

$L^2$ - the set of square-integrable functions 
$n_t(\omega)$ - a noise process 
$\eta_t(\omega)$ - a noise process 
$s_t(\omega)$ - the input signal process 
$r_t(\omega)$ - the output process 

$G$ - the linear channel operator 
$G^*$ - the adjoint of $G$ 

$P_\xi$ - a probability measure generated by $\xi$ 
$p_\eta$ - the probability density of the random vector $\eta$ 
$\mathcal{A}_\xi$ - a sigma field 
$I(\xi, \eta)$ - the mutual information between $\xi$ and $\eta$ 
$h(\eta)$ - the differential entropy of $\eta$ 
$\Gamma_s$ - the covariance of $s$ 
$E^k$ - Euclidean $k$-space 
$|\Gamma|$ - the determinant of $\Gamma$ 
$L$ - the Lebesgue measurable sets 
$m$ - Lebesgue measure 
Card - Cardinality 
$E$ - expected value 